Video encoding and decoding methods and corresponding encoding and decoding devices

ABSTRACT

An encoding method applied to an input video sequence corresponding to successive scenes subdivided into successive video object planes (VOPs) is provided that generates, for coding all the video objects of said scenes, a coded bitstream the content of which is described in terms of separate channels and constituted of encoded video data in which each data item is described by a bitstream syntax that allows the recognition and decoding of all the elements of the content. The syntax comprises additional syntactic information provided for describing independently the type of temporal prediction of the various channels. The additional information is a syntactic element placed at the slice level or the macroblock level in the coded bitstream, and its meaning is either specific for each present channel or shared by all existing channels.

PRIORITY CLAIMS/RELATED APPLICATIONS

This application is a 371 U.S. national stage filing of (and claims the benefit and priority under 35 USC 119 and 120 to) PCT/IB2004/001373 filed on Apr. 28, 2004 which in turn claims the benefit and priority under 35 USC 119 to European Patent Application Serial No. EP 03300011.8 filed on May 6, 2003, the entirety of both of which are incorporated by reference herein.

FIELD OF THE INVENTION

The present invention generally relates to the field of video compression and, for instance, to the video coding standards of the MPEG family (MPEG-1, MPEG-2, MPEG-4) and to the video recommendations of the ITU-H.26X family (H.261, H.263 and extensions, H.264). More specifically, this invention relates to an encoding method applied to an input video sequence corresponding to successive scenes subdivided into successive video object planes (VOPs) and generating, for coding all the video objects of said scenes, a coded bitstream the content of which is described in terms of separate channels and constituted of encoded video data in which each data item is described by means of a bitstream syntax allowing to recognize and decode all the elements of said content, said syntax comprising an additional syntactic information provided for describing independently the type of temporal prediction of the various channels, said predictions being chosen within a list comprising the following situations:

-   the temporal prediction is formed by directly applying the motion field sent by the encoder on one or more reference pictures;
-   the temporal prediction is a copy of a reference image;
-   the temporal prediction is formed by the temporal interpolation of the motion field;
-   the temporal prediction is formed by the temporal interpolation of the current motion field and further refined by the motion field sent by the encoder.

The invention also relates to a corresponding encoding device, to a transmittable video signal consisting of a coded bitstream generated by such an encoding device, and to a method and a device for decoding a video signal consisting of such a coded bitstream.

BACKGROUND OF THE INVENTION

In the first video coding standards and recommendations (up to MPEG-4 and H.264), the video was assumed to be rectangular and to be described in terms of a luminance channel and two chrominance channels. With MPEG-4, an additional channel carrying shape information was introduced. Two modes are available to compress those channels: the INTRA mode, according to which each channel is encoded by exploiting the spatial redundancy of the pixels in a given channel of a single image, and the INTER mode, which exploits the temporal redundancy between separate images. The INTER mode relies on a motion-compensation technique, which describes an image from one or several previously decoded image(s) by encoding the motion of pixels from one image to the other. Usually, the image to be encoded is partitioned into independent blocks or macroblocks, each of them being assigned a motion vector. A prediction of the image is then constructed by displacing pixel blocks from the reference image(s) according to the set of motion vectors (luminance and chrominance channels share the same motion description). Finally, the difference (called the residual signal) between the image to be encoded and its motion-compensated prediction is encoded in the INTER mode to further refine the decoded image. However, the fact that all pixel channels are described by the same motion information is a limitation that damages the compression efficiency of the video coding system.
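The INTER mode described above can be illustrated by a minimal sketch, given here in Python purely for illustration. The function names, the block size of 2, the (dy, dx) vector convention and the clamping at picture borders are assumptions and are not taken from any standard or from this description.

```python
def motion_compensate(reference, motion_vectors, block=2):
    """Build a prediction by copying displaced blocks from the reference image.

    reference: 2-D list of pixel values.
    motion_vectors: dict mapping (block_row, block_col) -> (dy, dx);
    blocks without an entry are assumed to have zero motion.
    """
    height, width = len(reference), len(reference[0])
    prediction = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dy, dx = motion_vectors.get((y // block, x // block), (0, 0))
            # Clamp source coordinates to the picture borders (an assumption).
            sy = min(max(y + dy, 0), height - 1)
            sx = min(max(x + dx, 0), width - 1)
            prediction[y][x] = reference[sy][sx]
    return prediction


def residual(current, prediction):
    """Pixel-wise difference between the image to encode and its prediction."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]
```

In this sketch a single set of motion vectors serves every channel, which is the limitation noted at the end of the paragraph above.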

SUMMARY OF THE INVENTION

It is therefore the object of the invention to propose a video encoding method in which said drawback is avoided by adapting the way the temporal prediction is formed.

To this end, the invention relates to a method such as defined in the introductory part of the description and which is moreover characterized in that said additional syntactic information is a syntactic element placed in said generated coded bitstream and its meaning is specific for each present channel, said element being placed at the slice level or at the macroblock level according to the proposed embodiment.

The invention also relates to a corresponding encoding device, to a transmittable video signal consisting of a coded bitstream generated by such an encoding device, and to a method and a device for decoding a video signal consisting of such a coded bitstream.

DETAILED DESCRIPTION OF THE INVENTION

According to the invention, it is proposed to introduce in the encoding syntax used by the video standards and recommendations additional information consisting of a new syntactic element that compensates for their lack of flexibility and opens new possibilities for encoding the temporal prediction of the various channels more efficiently and independently. This additional syntactic element, called for example “channel temporal prediction”, takes the following symbolic values:

Motion_compensation

Temporal_copy

Temporal_interpolation

Motion_compensated_temporal_interpolation,

and the meaning of these values is:

a) motion_compensation: the temporal prediction is formed by directly applying the motion field sent by the encoder on one or more reference pictures (this default mode is implicitly the INTER coding mode of most of the current coding systems);

b) temporal_copy: the temporal prediction is a copy of a reference image;

c) temporal_interpolation: the temporal prediction is formed by the temporal interpolation of the motion fields;

d) motion_compensated_temporal_interpolation: the temporal prediction is formed by the temporal interpolation of the current motion field and further refined by the motion field sent by the encoder.
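The four symbolic values listed above may be represented, for example, as an enumeration. The following Python sketch is illustrative only: the class name, the numeric codes and the fixed two-bit binarization are assumptions, since the description defines the symbolic values but not how they are coded in the bitstream.

```python
from enum import Enum


class ChannelTemporalPrediction(Enum):
    # Numeric codes are hypothetical; only the names come from the description.
    MOTION_COMPENSATION = 0
    TEMPORAL_COPY = 1
    TEMPORAL_INTERPOLATION = 2
    MOTION_COMPENSATED_TEMPORAL_INTERPOLATION = 3


def to_bits(mode):
    """Write the mode as a fixed-length 2-bit string (assumed binarization)."""
    return format(mode.value, "02b")


def from_bits(bits):
    """Read a 2-bit string back into a mode."""
    return ChannelTemporalPrediction(int(bits, 2))
```

Four values fit in two bits, which is why a fixed-length code is used in this sketch; an actual codec could equally use a variable-length or arithmetic code.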

The words “temporal interpolation” must be understood in a broad sense, i.e. as meaning any operation of the type defined by an expression such as Vnew = a·V1 + b·V2 + K, where V1 and V2 designate previously decoded motion fields, a and b designate coefficients respectively assigned to said motion fields, K designates an offset and Vnew is the new motion field thus obtained. It can therefore be seen that, in fact, the particular case of the temporal copy is included in the more general case of the temporal interpolation, for b=0 and K=0 (or a=0 and K=0).
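The expression Vnew = a·V1 + b·V2 + K can be sketched as follows in Python. The representation of a motion field as a 2-D list of (dy, dx) tuples, the function name and the treatment of K as a two-component offset added to every vector are assumptions made for illustration.

```python
def interpolate_motion_field(v1, v2, a, b, k=(0.0, 0.0)):
    """Return Vnew = a*V1 + b*V2 + K, computed vector by vector.

    v1, v2: previously decoded motion fields of identical dimensions,
    each a 2-D list of (dy, dx) tuples.
    k: offset added to every vector (assumed to be a 2-component tuple).
    """
    return [
        [(a * p[0] + b * q[0] + k[0], a * p[1] + b * q[1] + k[1])
         for p, q in zip(row1, row2)]
        for row1, row2 in zip(v1, v2)
    ]
```

For instance, with a = b = 0.5 and K = 0 the result is the average of the two fields, while b = 0 and K = 0 with a = 1 returns V1 unchanged, which corresponds to the temporal copy case mentioned above.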

According to the invention, the additional syntactic element thus proposed has to be placed at one of the following levels in the coded bitstream that has to be stored (or to be transmitted to the decoding side):

1) either at the slice level;

2) or at the macroblock level;

this additional syntactic element being in each case either specific for each present channel or, possibly, shared by all the channels.
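The two placement levels and the two scopes (specific for each channel, or shared) can be modelled, by way of example, with a small data structure. This Python sketch is hypothetical: the class name, the field names and the channel labels "luminance", "chrominance" and "shape" are assumptions chosen for illustration and do not reflect an actual bitstream layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional

CHANNELS = ("luminance", "chrominance", "shape")  # assumed channel labels
LEVELS = ("slice", "macroblock")
MODES = (
    "motion_compensation",
    "temporal_copy",
    "temporal_interpolation",
    "motion_compensated_temporal_interpolation",
)


@dataclass
class ChannelTemporalPredictionElement:
    level: str                                    # "slice" or "macroblock"
    shared_mode: Optional[str] = None             # one value for all channels
    per_channel: Optional[Dict[str, str]] = None  # one value per channel

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError("level must be 'slice' or 'macroblock'")
        # Exactly one of the two scopes must be used.
        if (self.shared_mode is None) == (self.per_channel is None):
            raise ValueError("give exactly one of shared_mode or per_channel")
        if self.per_channel is not None:
            if set(self.per_channel) != set(CHANNELS):
                raise ValueError("per_channel must cover every channel")
            modes = list(self.per_channel.values())
        else:
            modes = [self.shared_mode]
        for mode in modes:
            if mode not in MODES:
                raise ValueError("unknown mode: %s" % mode)

    def mode_for(self, channel):
        """Return the temporal prediction type that applies to a channel."""
        if channel not in CHANNELS:
            raise ValueError("unknown channel: %s" % channel)
        if self.shared_mode is not None:
            return self.shared_mode
        return self.per_channel[channel]
```

A decoder-side reader would call mode_for once per channel after parsing the element at the slice or macroblock header.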

This invention may be used in some identified situations where the way of constructing the temporal prediction can be switched on a slice or macroblock basis, and also on a channel basis. A first example may be for instance a sequence with a shape channel: it is possible that the shape information does not change much, whereas the luminance and chrominance channels carry varying information (it is for instance the case with a video depicting a rotating planet: the shape is always a disc, but its texture depends on the planet rotation). In this situation, the shape channel can be recovered by temporal copy, and the luminance and chrominance channels by motion compensated temporal interpolation. A second example may be the case of a change at the macroblock level. In a video sequence showing a seascape with the sky in the upper part of the picture, the sky, unlike the sea, remains the same from one image to the next. Its macroblocks can therefore be encoded by temporal copy, whereas the macroblocks of the sea have to be encoded by temporal interpolation.
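The seascape example can be sketched as a per-macroblock decision, given here in Python. The decision rule (sum of absolute differences against the co-located block of the reference image, compared with a threshold), the function name and the block size are assumptions for illustration; the description does not specify how an encoder selects the prediction type.

```python
def choose_macroblock_modes(current, reference, block=2, threshold=0):
    """Map each macroblock position (row, col) to a prediction type name.

    A macroblock whose pixels match the co-located reference block within
    the threshold is marked temporal_copy; any other macroblock is marked
    temporal_interpolation (hypothetical rule).
    """
    height, width = len(current), len(current[0])
    modes = {}
    for by in range(0, height, block):
        for bx in range(0, width, block):
            # Sum of absolute differences over the macroblock.
            sad = sum(
                abs(current[y][x] - reference[y][x])
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            )
            if sad <= threshold:
                modes[(by // block, bx // block)] = "temporal_copy"
            else:
                modes[(by // block, bx // block)] = "temporal_interpolation"
    return modes
```

With an unchanged sky in the upper rows and a changing sea in the lower rows, the upper macroblocks come out as temporal_copy and the lower ones as temporal_interpolation.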

The invention claimed is:
 1. An encoding method applied to an input video sequence corresponding to successive scenes subdivided into successive video object planes (VOPs), the method comprising: generating a coded bitstream for coding all the video objects of said scenes, the content of the coded bitstream is described in terms of separate channels and constituted of encoded video data in which each data item is described by means of a bitstream syntax allowing recognition and decoding of all the elements of said content, wherein the coded bitstream further comprises additional syntactic information provided for describing independently for each channel the type of temporal prediction for that channel, the type of temporal predictions being chosen from a list comprising the following situations: the temporal prediction is formed by directly applying the motion field sent by the encoder on one or more reference pictures; the temporal prediction is a copy of a reference image; the temporal prediction is formed by the temporal interpolation of the motion field; the temporal prediction is formed by the temporal interpolation of the current motion field and further refined by the motion field sent by the encoder; and wherein the additional syntactic information is a syntactic element placed at a selected level in said generated coded bitstream and its meaning is specific for each present channel.
 2. The method of claim 1, wherein the selected level is a macroblock level.
 3. An encoding method according to claim 1, characterized in that said meaning is shared by all existing channels.
 4. An encoding device, comprising: means for processing an input video sequence that corresponds to successive scenes subdivided into successive video object planes (VOPs); and means for generating a coded bitstream, the content of the coded bitstream is described in terms of separate channels and constituted of encoded video data in which each data item is described by means of a bitstream syntax allowing recognition and decoding of all the elements of said content, wherein the coded bitstream further comprises additional syntactic information provided for describing independently for each channel the type of temporal prediction for that channel, the type of temporal predictions being chosen from a list comprising the following situations: the temporal prediction is formed by directly applying the motion field sent by the encoder on one or more reference pictures; the temporal prediction is a copy of a reference image; the temporal prediction is formed by the temporal interpolation of the motion field; the temporal prediction is formed by the temporal interpolation of the current motion field and further refined by the motion field sent by the encoder; and wherein the additional syntactic information is a syntactic element placed at a selected level in said generated coded bitstream and its meaning is specific for each present channel.
 5. The method of claim 1, wherein the selected level is a slice level.
 6. The method of claim 1 further comprising encoding a shape channel using the temporal prediction that is a copy of a reference image and encoding a luminance channel and a chrominance channel using motion compensated temporal interpolation.
 7. The method of claim 1 further comprising encoding a first portion of the video input sequence using the temporal prediction that is a copy of a reference image and encoding a second portion of the video input sequence using motion compensated temporal interpolation.